Query Processing



SM3-Text-to-Query: Synthetic Multi-Model Medical Text-to-Query Benchmark

Neural Information Processing Systems

Electronic health records (EHRs) are stored in various database systems with different database models on heterogeneous storage architectures, such as relational databases, document stores, or graph databases. These different database models strongly affect query complexity and performance. While this has long been known in database research, its implications for the growing number of Text-to-Query systems have surprisingly not been investigated so far. In this paper, we present SM3-Text-to-Query, the first multi-model medical Text-to-Query benchmark based on synthetic patient data from Synthea, following the SNOMED-CT taxonomy, a widely used knowledge graph ontology covering medical terminology. SM3-Text-to-Query provides data representations for relational databases (PostgreSQL), document stores (MongoDB), and graph databases (Neo4j and GraphDB (RDF)), allowing evaluation across four popular query languages, namely SQL, MQL, Cypher, and SPARQL. We systematically and manually develop 408 template questions, which we augment to construct a benchmark of 10K diverse natural language question/query pairs for these four query languages (40K pairs overall). On our dataset, we evaluate several common in-context learning (ICL) approaches for a set of representative closed- and open-source LLMs.
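To make the multi-model setup concrete, here is a minimal sketch of what one benchmark entry might look like, pairing a natural-language question with queries in two of the four target languages. The question, table names, and graph labels are illustrative assumptions, not taken from the actual SM3 schema.

```python
# Hypothetical NLQ/query pair; schema names (patient, condition, Patient,
# HAS_CONDITION) are assumptions for illustration only.
pair = {
    "nlq": "How many patients have been diagnosed with hypertension?",
    "sql": (
        "SELECT COUNT(DISTINCT p.id) FROM patient p "
        "JOIN condition c ON c.patient_id = p.id "
        "WHERE c.description = 'Hypertension'"
    ),
    "cypher": (
        "MATCH (p:Patient)-[:HAS_CONDITION]->"
        "(c:Condition {description: 'Hypertension'}) "
        "RETURN count(DISTINCT p)"
    ),
}
```

A full benchmark entry would carry one query per target language (SQL, MQL, Cypher, SPARQL), which is what allows the same question to probe how the database model shapes query complexity.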



Scalable and Efficient Non-adaptive Deterministic Group Testing

Neural Information Processing Systems

Group Testing (GT) is about learning a (hidden) subset K, of size k, of some large domain N, of size n ≫ k, using a sequence of queries. The result of a query provides some information about the intersection of the query with the unknown set K. The goal is to design efficient (polynomial time) and scalable (polylogarithmic number of queries per element in K) algorithms for constructing queries that allow decoding every hidden set K based on the results of the queries. The vast majority of previous work focused on randomized algorithms minimizing the number of queries; however, in the case of large domains N, randomization may result in a significant deviation from the expected precision of learning the set K. Other work assumed unlimited computational power (existential results) or adaptiveness of queries (the next query could be constructed taking into account the results of the previous queries) - the former approach is less practical due to its inefficiency, and the latter has several drawbacks including non-parallelizability. To avoid all of the aforementioned drawbacks, for Quantitative Group Testing (QGT), where a query result is the size of its intersection with the hidden set, we present the first efficient and scalable non-adaptive deterministic algorithms for constructing queries and decoding a hidden set K from the results of the queries - these solutions do not use any randomization, adaptiveness, or unlimited computational power.
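The QGT query model described above can be illustrated with a toy simulation. This is not the paper's deterministic construction: the queries here are random and the decoder is a brute-force search over all k-subsets, which is only feasible for tiny n, but it shows exactly what information each query reveals.

```python
from itertools import combinations
import random

def qgt_answers(queries, hidden):
    # QGT oracle: each query Q returns |Q ∩ K| for the hidden set K.
    return [len(q & hidden) for q in queries]

def brute_force_decode(n, k, queries, answers):
    # Return every k-subset consistent with the answers; a good query
    # design makes this list a singleton.
    return [
        set(c) for c in combinations(range(n), k)
        if qgt_answers(queries, set(c)) == answers
    ]

random.seed(0)
n, k = 8, 2
hidden = {2, 5}
queries = [set(random.sample(range(n), 4)) for _ in range(6)]
candidates = brute_force_decode(n, k, queries, qgt_answers(queries, hidden))
```

With too few or badly chosen queries, several candidate sets may remain consistent; the paper's contribution is a deterministic, non-adaptive query design with an efficient decoder that avoids both the ambiguity and the brute-force search.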


Exploratory Retrieval-Augmented Planning For Continual Embodied Instruction Following, Wei-jin Park

Neural Information Processing Systems

This study presents an Exploratory Retrieval-Augmented Planning (ExRAP) framework, designed to tackle continual instruction following tasks of embodied agents in dynamic, non-stationary environments. The framework enhances Large Language Models' (LLMs) embodied reasoning capabilities by efficiently exploring the physical environment and establishing the environmental context memory, thereby effectively grounding the task planning process in time-varying environment contexts. In ExRAP, given multiple continual instruction following tasks, each instruction is decomposed into queries on the environmental context memory and task executions conditioned on the query results. To efficiently handle these multiple tasks that are performed continuously and simultaneously, we implement an exploration-integrated task planning scheme by incorporating information-based exploration into the LLM-based planning process. Combined with memory-augmented query evaluation, this integrated scheme not only allows for a better balance between the validity of the environmental context memory and the load of environment exploration, but also improves overall task performance. Furthermore, we devise a temporal consistency refinement scheme for query evaluation to address the inherent decay of knowledge in the memory. Through experiments with VirtualHome, ALFRED, and CARLA, our approach demonstrates robustness against a variety of embodied instruction following scenarios involving different instruction scales, instruction types, and degrees of non-stationarity, and it consistently outperforms other state-of-the-art LLM-based task planning approaches in terms of both goal success rate and execution efficiency.
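The idea of a context memory whose knowledge decays over time can be sketched in a few lines. The exponential decay form and the rate below are assumptions for illustration, not ExRAP's actual formulation: each stored observation carries its timestamp, and its confidence shrinks with elapsed time, so stale facts weigh less when a query is evaluated.

```python
import math

class ContextMemory:
    # Toy time-decaying environmental context memory (assumed decay model).
    def __init__(self, decay_rate=0.1):
        self.decay_rate = decay_rate
        self.facts = {}  # key -> (value, time_observed)

    def observe(self, key, value, t):
        self.facts[key] = (value, t)

    def query(self, key, t_now):
        # Return the stored value with a confidence that decays
        # exponentially in the time since it was observed.
        if key not in self.facts:
            return None, 0.0
        value, t_obs = self.facts[key]
        return value, math.exp(-self.decay_rate * (t_now - t_obs))

mem = ContextMemory(decay_rate=0.2)
mem.observe("fridge_open", True, t=0)
val_early, conf_early = mem.query("fridge_open", t_now=1)
val_late, conf_late = mem.query("fridge_open", t_now=10)
```

A planner could use such confidences to decide between acting on the memory and spending effort on re-exploration, which is the trade-off the abstract describes.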


Efficient Pure Exploration in Adaptive Round model

Neural Information Processing Systems

In the adaptive setting, many multi-armed bandit applications allow the learner to adaptively draw samples and adjust the sampling strategy in rounds. In many real applications, not only the query complexity but also the round complexity needs to be optimized. In this paper, we study both PAC and exact top-k arm identification problems and design efficient algorithms considering both round complexity and query complexity.
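A standard baseline that makes the round/query trade-off concrete is successive halving (a classic scheme, not this paper's algorithm): each round pulls every surviving arm a fixed number of times and discards the worse half, so the number of rounds is logarithmic in the number of arms.

```python
import random

def successive_halving(pull, n_arms, pulls_per_round=100, seed=0):
    # Each round: sample every surviving arm, keep the better half.
    # Rounds: ceil(log2 n); queries per round: pulls_per_round * survivors.
    rng = random.Random(seed)
    survivors = list(range(n_arms))
    while len(survivors) > 1:
        means = {
            a: sum(pull(a, rng) for _ in range(pulls_per_round)) / pulls_per_round
            for a in survivors
        }
        survivors.sort(key=lambda a: means[a], reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
    return survivors[0]

# Toy arms with well-separated means and small noise: arm 7 is best.
def pull(arm, rng):
    return 0.1 * arm + rng.gauss(0, 0.01)

best = successive_halving(pull, n_arms=8)
```

The paper's contribution is to go beyond such baselines and optimize round and query complexity jointly for both PAC and exact top-k identification.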


On Margin-Based Cluster Recovery with Oracle Queries

Neural Information Processing Systems

We study an active cluster recovery problem where, given a set of n points and an oracle answering queries like "are these two points in the same cluster?", the task is to recover exactly all clusters using as few queries as possible. We begin by introducing a simple but general notion of margin between clusters that captures, as special cases, the margins used in previous works, the classic SVM margin, and standard notions of stability for center-based clusterings. Under our margin assumptions we design algorithms that, in a variety of settings, recover all clusters exactly using only O(log n) queries.
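For contrast with the paper's O(log n)-query algorithms, here is the naive baseline for exact recovery with a same-cluster oracle: compare each point against one representative per discovered cluster, which recovers all clusters exactly but spends up to O(nk) queries.

```python
def recover_clusters(points, same_cluster):
    # Naive exact recovery: each cluster keeps its first point as a
    # representative; each new point is queried against representatives
    # until a match is found (or a new cluster is opened).
    clusters = []
    queries = 0
    for p in points:
        for cluster in clusters:
            queries += 1
            if same_cluster(p, cluster[0]):
                cluster.append(p)
                break
        else:
            clusters.append([p])
    return clusters, queries

# Ground-truth labels play the role of the oracle in this toy example.
labels = {"a": 0, "b": 1, "c": 0, "d": 2, "e": 1}
oracle = lambda x, y: labels[x] == labels[y]
clusters, n_queries = recover_clusters(list(labels), oracle)
```

Margin assumptions are what let the paper's algorithms beat this baseline: with enough geometric structure, whole regions of points can be labeled per query instead of one point at a time.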


Appendices: A. Basic Facts about Gaussian Distributions; B. An Improved Analysis of NA-Hutch++; C. Lower Bounds; C.1 Case 1: Lower Bound for Small ε

Neural Information Processing Systems

We let G denote an n × n random Gaussian matrix with i.i.d. entries. Then Rg has the same distribution as g. Recall that NA-Hutch++ splits its matrix-vector queries between computing an O(1)-approximate rank-k approximation Ã and performing Hutchinson's estimate on the residual matrix A − Ã. The key to the improved query complexity of NA-Hutch++ lies in the analysis of the size of the random Gaussian sketching matrices S, R in Algorithm 1 needed to obtain an O(1)-approximate rank-k approximation Ã in the Frobenius norm. To get the desired rank-k approximation, we need S and R to satisfy two properties: 1) subspace embedding as in Lemma 3.3 and 2) approximate matrix product for orthogonal subspaces as in Lemma 3.4. Specifically, we show in Lemma 3.4 that choosing S and R to be of size O(k + log(1/δ)) suffices to get the second property with probability 1 − δ.
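The Hutchinson step mentioned above is simple to state in code. This sketch shows only the plain Hutchinson trace estimator with Rademacher query vectors (the building block NA-Hutch++ applies to the residual A − Ã), in pure Python for a small dense matrix.

```python
import random

def matvec(A, x):
    # Dense matrix-vector product; each entry is a row-vector dot product.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def hutchinson_trace(A, num_queries, rng):
    # tr(A) ≈ (1/m) * sum over m queries of g^T A g, with g Rademacher;
    # each query costs one matrix-vector product.
    n = len(A)
    total = 0.0
    for _ in range(num_queries):
        g = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        Ag = matvec(A, g)
        total += sum(g_i * Ag_i for g_i, Ag_i in zip(g, Ag))
    return total / num_queries

A = [[1.0, 0.1, 0.0],
     [0.1, 2.0, 0.1],
     [0.0, 0.1, 3.0]]
estimate = hutchinson_trace(A, num_queries=5000, rng=random.Random(0))
```

The estimator's variance scales with the squared off-diagonal mass of A, which is exactly why NA-Hutch++ first deflates a rank-k approximation and runs Hutchinson only on the small residual.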


Contextual Active Model Selection, Rick L. Stevens

Neural Information Processing Systems

While training models and labeling data are resource-intensive, a wealth of pretrained models and unlabeled data exists. To effectively utilize these resources, we present an approach to actively select pre-trained models while minimizing labeling costs. We frame this as an online contextual active model selection problem: At each round, the learner receives an unlabeled data point as a context. The objective is to adaptively select the best model to make a prediction while limiting label requests. To tackle this problem, we propose CAMS, a contextual active model selection algorithm that relies on two novel components: (1) a contextual model selection mechanism, which leverages context information to make informed decisions about which model is likely to perform best for a given context, and (2) an active query component, which strategically chooses when to request labels for data points, minimizing the overall labeling cost. We provide rigorous theoretical analysis for the regret and query complexity under both adversarial and stochastic settings. Furthermore, we demonstrate the effectiveness of our algorithm on a diverse collection of benchmark classification tasks. Notably, CAMS requires substantially less labeling effort (less than 10%) compared to existing methods on CIFAR10 and DRIFT benchmarks, while achieving similar or better accuracy.
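The two ingredients of the abstract, a selection mechanism over pretrained models and a rule for when to pay for a label, can be sketched with a simplified non-contextual stand-in (this is an assumed simplification, not CAMS itself): exponential weights over the models, updated only on rounds where a coin flip decides to query the true label.

```python
import random

def active_model_selection(models, stream, eta=0.5, query_prob=0.5, seed=0):
    # Exponential-weights model selection with randomized label queries.
    # CAMS additionally conditions the selection on the context x.
    rng = random.Random(seed)
    weights = [1.0] * len(models)
    labels_queried = 0
    for x, y in stream:
        if rng.random() < query_prob:  # only sometimes pay for a label
            labels_queried += 1
            for i, model in enumerate(models):
                loss = 0.0 if model(x) == y else 1.0
                weights[i] *= (1 - eta) ** loss  # multiplicative update
    return weights, labels_queried

noise = random.Random(1)
stream = [(x, x % 2) for x in range(400)]
good = lambda x: x % 2               # always correct
bad = lambda x: noise.randrange(2)   # coin-flip predictor
weights, n_labels = active_model_selection([good, bad], stream)
```

After the stream, the always-correct model keeps weight 1.0 while the coin-flip model's weight collapses, and only a fraction of the 400 labels were ever requested, which is the query-efficiency the abstract quantifies for CAMS.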